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Abstract 

Particle filters are broadly used to approximate posterior distributions of hidden states in state-space models by means of sets 
of weighted particles. While the convergence of the filter is guaranteed when the number of particles tends to infinity, the quality 
of the approximation is usually unknown but strongly dependent on the number of particles. In this paper, we propose a novel 
method for assessing the convergence of particle filters online manner, as well as a simple scheme for the online adaptation of the 
number of particles based on the convergence assessment. The method is based on a sequential comparison between the actual 
observations and their predictive probability distributions approximated by the filter. We provide a rigorous theoretical analysis of 
the proposed methodology and, as an example of its practical use, we present simulations of a simple algorithm for the dynamic 
and online adaption of the number of particles during the operation of a particle filter on a stochastic version of the Lorenz system. 


Index Terms 

Particle filtering, sequential Monte Carlo, convergence assessment, predictive distribution, convergence analysis, computational 
complexity, adaptive complexity. 


I. Introduction 

Many problems in science and engineering can be described by dynamical models where hidden states of the systems change 
over time and observations that are functions of the states are available. Often, the observations are sequentially acquired and 
the interest is in making recursive inference on the hidden states. In many applications, the Bayesian approach to the problem 
is adopted because it allows for optimal inclusion of prior knowledge of the unknown state in the estimation process m, El. 
In this case, the prior information and the likelihood function that relates the hidden state and the observation are combined 
yielding a posterior distribution of the state. 

Exact Bayesian inference, however, is only possible in a small number of scenarios, including linear Gaussian state-space 
models (using the Kalman filter El^ 131) and finite state-space hidden Markov models (HMM filters Q). Therefore, in many 
other practical problems, only approximate inference methods can be used. One class of suboptimal methods is particle filtering, 
which is also known as sequential Monte Carlo sampling 0,0, HI, ii, cni. Since the publication of mi, where the sampling 
importance resampling (SIR) filter was introduced, particle filtering has received outstanding attention in research and practice. 
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Particle filters approximate posterior distributions of the hidden states sequentially and recursively. They do it by exploiting 
the principle of importance sampling and by using sets of weighted particles 0, Q, Cl- 

The key parameter of particle filters is the number of particles. It can be proved that the rate of convergence of the approximate 
probability distribution towards the true posterior is inversely proportional to the square root of the number of particles used 
in the filter ifTSll . ifT^ . This, too, entails that the filter “perfectly” approximates the posterior distribution when the number of 
particles tends to infinity. However, since the computational cost grows with the number of particles, practitioners must choose 
a specific number of particles in the design of their filters. 

In many applications, the observations arrive sequentially, and there is a strict deadline for processing each new observation. 
Then, one could argue that the best solution in terms of filter performance is to increase the number of particles as much as 
possible and keep it fixed. Also, in some hardware implementations, the number of particles is a design parameter that cannot 
be modified during implementation. Nevertheless, in many other applications where resources are scarce or are shared with a 
dynamical allocation and/or with energy restrictions, one might be interested in adapting the number of particles in a smart 
way. One would use enough particles to achieve a certain performance requirement but without wasting resources by using 
many more particles if they do not translate into a significant improvement of the filter performance. 

The selection of the number of particles, however, is often a delicate subject because, (1) the performance of the filter (the 
quality of the approximation) cannot usually be described in advance as a function of the number of particles, and (2) the 
mismatch between the approximation provided by the filter and the unknown posterior distribution is obviously also unknown. 
Therefore, although there is a clear trade-off between performance and computational cost, this relation is not straightforward; 
e.g., increasing the number of particles over a certain value may not significantly improve the quality of the approximation 
while decreasing the number of particles below some other value can dramatically affect the performance of the filter. 

Few papers in the wide literature have addressed the problem of online assessment of the filter convergence for the purpose 
of adapting the number of particles. In iflTll . the number of particles is selected so that the bound on the approximation error 
does not exceed a threshold with certain probability. The latter error is defined as the Kullback-Leibler divergence (KLD) 
between the approximate filter distribution and a grid-discretized version of the true one (which is itself a potentially-costly 
approximation with an unknown error). In ca, an adaptation of the number of particles is proposed, based on the KLD 
approach of in and an estimate of the variance of the estimators computed via the particle filter, along with an improvement 
of the proposal distributions. In m . the adaptation of the number of particles is based on the effective sample size. To our 
best knowledge, all existing methods are heuristic: they do not enjoy any theoretical guarantees (in the assessment of the 
approximation errors made by the particle filter) and the allocation of particles, therefore, cannot be ensured to be optimal 
according to any probabilistic criterion. 

In this paper, we introduce a model-independent methodology for the online assessment of the convergence of particle filters 
and carry out a rigorous analysis that ensures the consistency of the proposed scheme under fairly standard assumptions. The 
method is an extension of our previous work presented in El. In the proposed scheme, the observations are processed one 
at a time and the filter performance is assessed by measuring the discrepancy between the actual observation at each time 
step and a number of fictitious data-points drawn from the particle approximation of the predictive probability distribution 
of the observations. The method can be exploited to adjust the number of particles dynamically when the performance of 
the filter degrades below a certain desired level. This would allow a practitioner to select the operation point by considering 
performance-computational cost tradeoffs. Based on the method, we propose a simple and efficient algorithm that adjusts the 
number of particles in real time. We demonstrate the performance of the algorithm numerically by running it for a stochastic 
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version of the 3-dimensional Lorenz 63 system. 

Let us point out that the adaptive procedure for the online selection of the number of particles described herein is only one 
of many that can exploit the results of the convergence analysis. In other words, our analysis opens the door for development 
of new family of algorithms for online adaptation of the number of particles by way of online convergence assessment. 

The rest of the paper is organized as follows. In Section]^ we describe the class of state space Markov models and provide 
a basic background on the well-known bootstrap particle filter of im. The theoretical results that enable the online assessment 
of particle filters are stated in Section with full details and proofs contained in Appendix The proposed methodology 
for online convergence assessment of the particle filter is introduced in Section]^ Furthermore, this section provides a simple 
algorithm for the dynamic, online adaptation of the number of particles. In Section |V] we illustrate the validity of the method 
by means of computer simulations for a stochastic Lorenz 63 model. Finally, Section |Vl] contains a summary of results and 
some concluding remarks. 


11. Particle filtering 

In this section we describe the class of state space models of interest and then present the standard particle filter (PF), which 
is the basic building block for the methods to be introduced later. 

A. State space models and stochastic filtering 

Let us consider discrete-time, Markov dynamic systems in state-space form described by the triple^ 

Zo - p{xo), 

Xt p{xt\xt-i), 

Yt - pCftki), 

where 

• t S N denotes discrete time; 

• Xt is the dx x 1-dimensional (random) system state at time t, which takes variables in the set X C 

• p{xo) is the a priori pdf of the state, while 

• p{xt\xt-i) denotes the conditional density of the state Xt given Xt-i =Xt-i', 

• Yt is the dy x 1-dimensional observation vector at time t, which takes values in the set y C and is assumed to be 
conditionally independent of all other observations given the state Xt, 

• p{yt\xt) is the conditional pdf of Yt given Xt = Xt- It is often referred to as the likelihood of Xt, when it is viewed as a 
function of Xt given 

The model described by Eqs. Q-([^ includes a broad class of systems, both linear and nonlinear, with Gaussian or non- 
Gaussian perturbations. Here we focus on the case where all the model parameters are known. However, the proposed method 
can also be used for models with unknown parameters for which suitable particle filtering methods are available m, m, 
EOl . We assume that the prior distribution of the state p{xq) is also known. 

*In most of the paper we abide by a simplified notation where p{x) denotes the probability density function (pdf) of the random variables X. This notation 
is argument-wise, hence if we have two random variables X and Y, then p{x) and p{y) denote the corresponding density functions, possibly different; p{x, y) 
denotes the joint pdf and p(x\y) is the conditional pdf of X given Y = y. A more accurate notation, which avoids any ambiguities, is used for the analysis 
and the statement of the theoretical results. Vectors are denoted by bold-face letters, e.g., x, while regular-face is used for scalars, e.g., x. 


( 1 ) 

( 2 ) 

( 3 ) 
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The stochastic filtering problem consists in the computation of the sequence of posterior probability distributions given by 
the so-called ^/fenng densities p{xt\yl.^), f = 1,2, • • •. The pdf t) closely related to the one-step-ahead predictive 

state density which is of major interest in many applications and can be written down by way of the Chapman- 

Kolmogorov equation, 

Pixt\yi-.t-i) = J p{xt\xt-i)pixt-i\yi.,t-i)dxt-i- (4) 

Using Bayes’ theorem together with Eq. we obtain the well-known recursive factorization of the filtering pdf 

ocp{yt\xt) J p{xt\xt-i)p{xt-i\yi,t_i)dxt-i. 

For conciseness and notational accuracy, we use the measure-theoretic notation 

TTt{dxt) := p{xt\yi,t)dxt, it{dxt) := p{xt\yi.t-i)dxt 


to represent the filtering and the predictive posterior probability distributions of the state, respectively. Note that ttj and 
are probability measures, hence, given a Borel set A C A', 7rt(A) = J^Tr(dxt) and ^tiA) = J^^^tidxt) denote the posterior 
probability of the event Xt G A conditional on Yi-t =3'i.( and Ti:t-i =yi.t-i, respectively. 

However, the object of main interest for the convergence assessment method to be introduced in this paper is the predictive 
pdf of the observations, namely the function p(yt\yi-t-i) the associated probability measure 

Pt{dyt) ■.= p(yt\yi,t-i)dyt. 

The density pCVtlTi t-i) '^he normalization constant of the filtering density p{xt\yi.t), and it is related to the predictive state 
pdf p{xt\yi.i_i) through the integral 

ph/t\yi:t-i) = J p(yt\xt)pixt\yi,t-i)dxt. (5) 

It also plays a key role in model assessment lEl and model inference problems mi, Eol, ED. 


B. The standard particle filter 

A PF is an algorithm that processes the observations sequentially in order to compute Monte Carlo approximations 

of the sequence of probability measures {TTt]t>i- The simplest algorithm is the so-called bootstrap filter (BF) ifTTl (see also 
EH), which consists of a recursive importance sampling procedure and a resampling step. The term “particle” refers to a 
Monte Carlo sample in the state space X, which is assigned an importance weight. Below, we outline the BF algorithm with 
M particles. 


Algorithm 1 . Bootstrap filter. 

1) Initialization. At time t — 0, draw M i.i.d. samples, x^\ m = 1,..., M, from the prior p{xo). 

2) Recursive step. Let particles at time t — 1. At time t, proceed with the two steps below. 

a) For m = 1,...,M, draw from the model transition pdf p{xt\x[^l). Then compute the normalized importance 


weights 


(m) 

w; = 


pjytlxt ) 


TO = 1,..., M. 


(6) 


b) Resample M times with replacement: for to = 1,..., M, let with probability w^\ where k G {1,..., M}. 
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For the sake of simplicity, in step 2.(b) above we assume that multinomial resampling Q is carried out for every t > 1. The 
results and methods to be presented in subsequent sections remain valid when resampling is carried out periodically and/or 
using alternative schemes such as residual la, stratihed or minimum-variance UtM resampling (see also my 

The simple BF yields several useful approximations. After sampling at step 2.(a), the predictive state probability measure 
can be approximated as 


1 


M 




m—1 


where denotes the Dirac delta measure located at jr € A". The filter measure tt^ can be similarly approximated, either using 
the particles and weights computed at step 2.(a) or the resampled particles after step 2.(b), i.e.. 


M 


-M 


= E 


(™) JT 


and TT. 


M 




M 


M 


(m), 


m—1 m—1 

respectively. In addition, the BF yields natural approximations of the predictive pdf’s of Xt and Yt given the earlier observations 
Yi-t-i =yl.^_l■ If we specifically denote these functions as pt{xt) := p{xt\yi..i._i) and ptiji) •= F()'il3'i t-i)> then we readily 
obtain their respective estimates as mixture distributions with M mixands, or, 


M 


:= '^w^_^p{xt\x):z\), 


and 


m—1 


M 


pfiyt 


■= 




for any Xt G X and € y. 


III. A NOVEL ASYMPTOTIC CONVERGENCE RESULT 


The convergence of the approximate measures, e.g., , towards the true ones is usually assessed in terms of the estimates 

of 1-dimensional statistics of the corresponding probability distribution. To be specific, let / : /F —K be a real integrable 
function in the state space and denot^ 

if, it) ■■= j f{xt)it{dxt). 


Under mild assumptions on the state space model, it can be proved that 


lim if,if) = lim ^ fixf^) = if, it) 


M 


M—yoo 


M—yoo M 


(7) 


almost surely (a.s.) Il26ll . ifT^ . 

According to the predictive observation pdf ptiyt) is an integral w.r.t. and, as a consequence, Eq. Q implies that 
pf iy) = Ptiy) a.s. and point-wise for every y G y under mild assumptions ll26l . However, existing theoretical 
results do not ensure that pf iy) can converge uniformly on Y towards ptiy) and this fact prevents us from claiming that 
limM->.oo / h{y)pf {y)dy = / h(y)ptiy)dy = {h^pt) in some proper sense for integrable real functions h{y). 

The first contribution of this paper is to prove that, under mild regularity assumptions on the state space model, the continuous 
random probability measure 


pf{dy) :=pf(y)dy 


converges a.s. to fit and provide explicit error rates. To express this result rigorously, we need to introduce some notation: 

^ Let be a measurable space, where Z C for some integer d > 1 and is the Borel cr-algebra of subsets of Z. If q is a measure on 

B{Z) and the function h : Z —y M. is integrable with respect to (w.r.t.) a, then we use the shorthand notation (/, a) := f f(z)a(dz). 
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• For each t > 1, let us define the function gt{y^,Xt) := p(y^\xt), i.e., the conditional pdf ofj^ given Xt- When this function 

is used as a likelihood, we write (fi{xt) '■= gt{yt,Xt) to emphasize that it is a function of Xt- 

• Let / ; Z —> M be a real function on some set Z. We denote the absolute supremum of / as ||/||oo := sup^g^ l/(2)l- 

The set of bounded real functions on Z is B{Z) := {/ : Z —>■ K such that ||/||oo < oo}. 

• Let a = {ai,...,ad) be a multi-index, where each a^, i = is a non-negative integer. Let / ; Z —>■ K be a 

real function on a d-dimensional set Z C We use D“f{z) to denote the partial derivative of / w.r.t. the variable z 

determined by the entries of a, namely. 


Dlfiz) 


gai . . . gadj 

■ ■ ■ dzy 


The order of the derivative operator is |a| = J2i=i 
• The minimum out of two scalar quantities, a, 6 G K, is denoted a Ab. 

We make the following assumptions on the likelihood function gt and the predictive observation measure gt{dyt) = Ptiyt)dyf 
(£) For each f > 1, the function gt is positive and bounded, i.e., gt(y,x) > 0 for any G y x X and 

lldtiloo = sup(y_^)gyxjf \gt(y,x)\ < oo. 

(S)) For each f > 1, the function gt(y,x) is differentiable with respect to y, with bounded derivatives up to order dy, i.e., 
Dlgtiy,x) = oS^^(yx) exists and 


\\Dlgt\\oc = sup \Dlgt(y,x)\ < oo. 
(y,x)eyxx 


(£) For any 0</?< 1 and any p > 4, the sequence of hypercubes 


C, 


M ■ — 


r & 

M p 

1 

M p 

X ■ 

• X 

r ^ 

M p 

^ 1 

M p 

2 

' 2 

2 

' 2 


c 


satisfies the inequality /it(C m) < bM ^ for some constants b > 0 and 77 > 0 independent of M (yet possibly dependent 
on P and p), where Cm = K‘^’'\Cm is the complement of Cm- 


Remark 1 . Assumptions (£) and (Dj refer to regularity conditions (differentiability and boundedness) that the likelihood 
function of the state space model should satisfy. Models of observations, for example, of the form y^ = f{xt) +Ut, where f 
is a (possibly nonlinear) transformation of the state Xt and Ut is noise with some differentiable, exponential-type pdf (e.g., 
Gaussian or mixture-Gaussian), readily satisfy these assumptions. Typical two-sided heavy-tailed distributions, such as Student’s 
t distribution, also satisfy (£) and CD). 

Assumption (€) states that the tails of the pdf ptCyf) = p()'tl3'i t-i) should not be too heavy. Being polynomial on M, 
this constraint is relatively weak and the assumption is satisfied for all exponential-type distributions as well as for many 
heavy-tailed distributions. For example, when dy = 1, one can choose the constants b and g such that bM~^ is an upper 
bound for the tails of the (heavy-tailed) Pareto, Weibull, Burr or Levy distributions. 


Theorem 1. Assume that (£), (D) and (€) hold and the observations yi.^_i are fixed (and otherwise arbitrary). Then, for 
every h G B{y) and any e G (0, there exists an a.s. finite r.v. Wf independent of M, such that 

In particular, 

lim (h, p,^) = (h, p,t) a.s. 


See Appendix for a proof. 
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IV. Online selection of the number of particles 

In the sequel we assume scalar observations, hence dy = 1 and y^ = yt (while da; > 1 is arbitrary). A discussion of how to 
proceed when dj, > 1 is provided in Section IV-E 


Our goal is to evaluate the convergence of the BF (namely, the accuracy of the approximation {yt)) in real time and, 
based on the convergence assessment, adapt the computational effort of the algorithm, i.e., the number of used particles M. 

To that end, we run the BF in the usual way with a light addition of computations. At each iteration we generate K “fictitious 
observations”, denoted ..., from the approximate predictive pdf {yt). If the BF is operating with a small enough 
level of error, then Theorem [T] states that these fictitious observations come approximately from the same distribution as the 
acquired observation, i.e., y^(dyt) « yt(dyt). In that case, as we explain in Subsection ' 


IV-B 


a statistic af" can be constructed 


',( 1 ) 
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using yt,yf', ..., y^ a which necessarily has an (approximately) uniform distribution independently of the specific form of 
the state-space model ([T])-(|^. By collecting a sequence of such statistics, say ..., for some window size W, one 

can easily test whether their empirical distribution is close to uniform using standard procedures. The better the approximation 
« pt generated by the BF, the better fit with the uniform distribution can be expected. 

If AT << M and W is not too large, the cost of the added computations is negligible compared to the cost of running the 
BF with M particles and, as we numerically show in Section |V] the ability to adapt the number of particles online leads to a 
very significant reduction of the running times without compromising the estimation accuracy. 

Below we describe the method, justify its theoretical validity and discuss its computational complexity as well as its extension 
to the case of multidimensional y^’s. 


A. Generation of fictitious observations 

The proposed method demands at each time t the generation of K fictitious observations (i.e., Monte Carlo samples), denoted 
from the approximate predictive observation pdf pf^{yt) = Sm=i Since the latter density is a finite 
mixture, drawing from pf^{yt) is straightforward as long as the conditional density of the observations, p{yt\xt), is itself 
amenable to sampling. In order to generate yl^\ it is enough to draw a sample from the discrete uniform distribution on 
{1, 2,..., M} and then generate y^^ ^ p{yt\x[^ ^). 


B. Assessing convergence via invariant statistics 

For simplicity, let us assume first that pf^{yt) = Pt{yt) = p{yt\yi-.t-i), i-e-j there is no approximation error and, 
therefore, the fictitious observations have the same distribution as the true observation yt. We define the set 

AK,t '■= {y S : y < yt} and the r.v. AK,t ■= \A-K,t\ € {0,1,..., AT}. Note that AK,t is the set of fictitious 

observations which are smaller than the actual one, while Ax^t is the number of such observations. If we let denote the 
probability mass function (pmf) of Ax, it is not hard to show that Qx is uniform independently of the value and distribution 
of yt. This is rigorously given by the Proposition below. 

Proposition 1. If yt,y\^\... are i.i.d. samples from a common continuous (but otherwise arbitrary) probability 

distribution, then the pmf of the r.v. Ax,t A 

QK{n) = j^, n = 0,...,K. (8) 

Proof: Since yt,y[^\--- ,y[^'’ are i.i.d., all possible orderings of the AT -f 1 samples are a priori equally probable, and 
the value of the r.v. Ax,t depends uniquely on the relative position of yt after the samples are sorted (e.g., if yt is the 





smallest sample, then AK,t — 0, if there is exactly one < yt then AK,t = 1, etc.)- There are {K + 1)! different ways 
in which the samples yt,y^\ ■ ■ ■ can be ordered, but A^^t can only take values from 0 to K. In particular, given the 

relative position of yt, there are K\ different ways in which the remaining samples y['^\ ■ ■ ■ can be arranged. Therefore, 

Qk{Ak = n) = for every n S {0,1,..., K}. □ 

In practice, {yt) is just an approximation of the predictive observation pdf Pt{yt) ™d, therefore, the actual and hctitious 
observations are not i.i.d. However, under the assumptions of Theorem [T] the a.s. convergence of the approximate measure 
(dyt) = Pt^{yt)dyt enables us to obtain an “approximate version” of the uniform distribution in Proposition with the 
error vanishing as M —>■ oo. To be specihc, we introduce the set AK,M,t '■= {y G {yt^'^}k=i • y < ?/*}’ which depends on M 
because of the mismatch between p^{yt) and pt{yt), and the associated r.v. Ax^M,t = with pmf QK,M,t- We have 

the following convergence result for QK,M,t- 


Theorem 2. Let yt be a sample from Pt{yt) ond let {y^^}k=i i-i.d. samples from p^{yt). If the observations yi-,t-i 
are fixed and Assumptions (£,), (ID) and (€) hold, then there exists a sequence of non-negative r.v.’s {e^ljvreN such that 
limM->-oo = 0 a.s. and 


1 

iT+ 1 


- < QK,M,t{n) < 


1 

K + 1 


+ er. 


( 9 ) 


In particular, liniM-i-oo = 7 ^ a.s. 

See Appendix for a proof. Proposition states that the statistic AK,t is distribution-invariant, since Q/f (n) = 
independently of t and the state space model. Similarly, Theorem implies that the statistic AK,M,t is asymptotically 
distribution-invariant (independently of t and the model) since QK,M,t{n) when M ^ 00 , as the BF convergesFl 


C. BF algorithm with adaptive number of particles 


We propose an algorithm that dynamically adjusts the number of particles of the hlter based on the transformed r.v. Ax,M,t- 
Table |n] summarizes the proposed algorithm, that is embedded into a standard BF (see Section II-B 1 but can be applied to 
virtually any other particle hlter in a straightforward manner. The parameters of the algorithm are shown in Table |I] 

The BF is initialized in Step 1(a) with Mq initial particles. At each recursion, in Step 2(a), the hltered distribution 

of the current state is approximated. In Step 2(b), K hctitious observations are drawn and the statistic 

Ak,m,-i — a,K,M,t is computed. In Step 2 (b), once a set of W consecutive statistics have been acquired, St = 

{aK,M,t-w-viTa,K,M,t-w-\- 2 , a statistical test is performed for checking whether St is a sequence of 


i.i.d. samples from the pmf given by Eq. ©■ 

There are several approaches that can be used to exploit the information contained in St. Here we perform a Pearson’s 
chi-squared test Ezl, where the Xt statistic is computed according to Eq. ([Tgi (see Table |^. Then, a p-value p’^ ^ for testing 
the hypothesis that the empirical distribution of St is uniform is computed. The value ^ is obtained by comparing the Xt 
statistic with the distribution with K degrees of freedom. Intuitively, a large p*j^ ^ suggests a good match of the sequence 
St with an i.i.d. sample from the uniform distribution on {0,1,..., AT}, while a small p*j^ ^ indicates a mismatch. Therefore, 
the p-value p’^ ^ is compared with two different signihcance levels: a low threshold pi and a high threshold p^. If p^ t S Pi, 
the number of particles is increased according to the rule Mt = fup{Mt-i) whereas, if > ph, the number of particles 
is decreased according to the rule Mt = /down(ATt-i)- When pi < p^j < ph, the number of particles remains fixed. These 


^Specifically note that, under assumptions (£), (2)) and (£), the convergence of the continuous random measure computed via the BF (which is 
sufficient to obtain see Appendix [Bj is guaranteed by Theorem^ 
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TABLE I; Parameters of the algorithm 


- Mq, initial number of particles 

- ALniii> minimum number of particles 

- Afmax, maximum number of particles 

- K, number of fictitious samples per iteration 

- W, window length 

- p£, lower significance level of p-values 

- Ph, higher significance level of p-values 

- /up(-)> rule for increasing M 

- /dowii(')’ rule for decreasing M 


two significance levels allow the practitioner to select the operation range by considering a performance-to-computational-cost 
tradeoff. Note that we set and maximum and minimum values for the number of particles, respectively. 

A large window W yields a more accurate convergence assessment but increases the latency (or decreases the responsiveness) 
of the algorithm. If the algorithm must be run online, this latency can be critical for detecting a malfunction of the filter and 
adapting consequently the number of particles. Therefore there is a tradeoff between the accuracy of the convergence assessment 
procedure and latency of the algorithm. 


D. Computational cost 

Compared to the BE, the additional computational cost of the method is mainly driven by the generation of the K fictitious 


observations at each iteration as shown in Subsection IV-A The generation of these fictitious observations is a two-step 
procedure, where in the first step, we draw K discrete indices, say ji, ...jjK, from the set {1,..., M„} with uniform probabilities, 
and in the second step, we draw K samples from respectively. 

In the proposed algorithm, a Pearson’s test is performed with a sequence St of W samples, that is, it is carried out only 
once every W consecutive time steps. Therefore, the computational cost will depend on the parameters K and W. We will 
show in Section |V] that the algorithm can work very well with a low number of fictitious observations, which imposes a very 
light extra computational load. 


E. Multidimensional observations 

Through this section, we have assumed scalar observations. In the multidimensional case, with = [yi ..., 
the same assessment scheme can be applied over each marginal p{yi,t\yi-t-i) of the predictive observation pdf. Theoretical 
guarantees readily follow from the convergence of the marginal measures y,f{{dyi^t) = {yi,t\yi-t-i)dyi,t under the same 
assumptions as the joint measure (see Appendix [a|. 

Note that the convergence of the marginals does not imply the convergence of the joint approximation However, it can 
be reasonably expected that when all marginals are approximated well over a period of time, the joint distribution is accurately 
approximated as well. 
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TABLE II; Algorithm for adapting the number of particles 


1) [Initialization] 

a) Initialize the particles and the weights of the filter as 


= 1/Mo, 


m = 1,..., Mq , 
m = 1,..., Mq, 


and set n = 1. 

2) [For t = l:T] 

a) Bootstrap particle filter: 

- Resample Mn samples of with weights to obtain 

- Propagate m = I,..., Mn- 

- Compute the non-normalized weights = p{yt\x[^^), m = 1,..., Mn. 

- Normalize the weights to obtain w^'^\ m = 1,..., Mn- 

b) Fictitious observations: 

- Draw ~ (yt\yt-i), k = 1,... ,K. 

- Compute aK,M,t = ^k,M 7 i-^-. the position of yt within the set of ordered fictitious observations 

c) If f = nW, (assessment of convergence): 

- Compute the Xt statistic over the empirical distribution of St = {o^k m ti M t-i? •••? M t—w+i} ^s 


2 _ ’sp (Qj — Ej)^ 

P], ’ 

where Oj is the frequency of the observations in the window being in the jfth relative position, i.e., Oj = |ax m t C tSt : m t 

and Ej is the expected frequency under the null hypothesis, i.e., Ej = W ■ QkU) = 0)- 

- Calculate the p-value p*^^ ^ by comparing the statistic x? to the x^ "distribution with K degrees of freedom. 

- If PK^t < PI 

increase Mn = min{/up(M„_i), Mmax}- 

- Else, if j > ph, 

decrease Mn = max{/do„n(M„_i), 


( 10 ) 
= il 


- Else, 


Mn — Mn — 1 ■ 
- Set n = n + 1. 


d) If f < Wn, set i = t + 1 and go to 2. Otherwise, end. 


V. Numerical example 

A. The three-dimensional Lorenz system 

In this section we show computer simulation results that demonstrate the performance of the proposed method. We consider 
the problem of tracking the state of a three-dimensional Lorenz system ll28l with additive dynamical noise, partial observations 
and additive measurement noise Il29l . Namely, we consider a three-dimensional stochastic process {X(s)}sg(o,oo) taking values 
on whose dynamics are described by the system of stochastic differential equations 

dXi = -s{Xi-Yi) + dWi, 
dX2 = rXi - X2 - X1X3 + dW2, 
dX 3 = X1X2 - bXs + dWs, 




II 


where ( 0 , 00)7 ^ = 1,2,3, are independent one-dimensional Wiener processes and 

(s, r, b) = ^10,28,^^ 

are static model parameters broadly used in the literature since they lead to a chaotic behavior 1^ . Here we use a discrete-time 
version of the latter system using an Euler-Maruyama scheme with integration step A = 10“^, which yields the model 


A^l,n = — As(Ai_„_i — A2,n-l) + 

X2,n = + A(rAi_„_i — A2,„_1 — Ai^„_iA3_„_i) -f •\/AC/2.n) 

^3,n = + A(Ai_„_iA2,n-l — bA3_„_i) -f \/ 


( 11 ) 

( 12 ) 

(13) 


where {C/i,n}n=o,i,...j * = 1)2,3, are independent sequences of i.i.d. normal random variables with zero mean and unit 
variance. The system ([n)-([T3} is partially observed every 200 discrete-time steps. Specifically, we collect a sequence of scalar 
observations {yt}t=i, 2 ,...) of the form 

Yt = Xi,2oot + Vu (14) 

where the observation noise {Vt}t=i. 2 ,... is a sequence of i.i.d. normal random variables with zero mean and variance cr^ = 4. 

Let Xn = X 2 ,„, A 3 _„) G be the state vector. The dynamic model given by Eqs. ([n])-([T 3 defines the transition 

kernel p{Xn\Xn-i) and the observation model of Eq. ( [T4l l is the likelihood function 

p(j/t|a;i,2oot) oc exp ivt - xi,2ootf 

The goal is on tracking the sequence of joint posterior probability measures tt^, f = 1, 2,..., for {Xt}t=i,..., where Xt = A 2 oot- 
Note that one can draw a sample Xt = Xt conditional on Xt_i = Xt-i by successively simulating 


Xn ^ p{Xn\Xn-i), n = 200(f - 1) -f 1, ..., 200f, 


where JC 2 oo(t-i) = it-i and Xt = X 2 oot- The prior measure for the state variables is normal, namely 

Zo ^ N{x^,vllz), 

where x^ = (—5.9165; —5.5233; 24.5723) is the mean and is the covariance matrix of Zq , with Vq = 10 and I 3 being 
the three-dimensional identity matrix. 


B. Simulation setup 

With this example, we aim at showing how the proposed algorithm allows to operate the particle filter with a prescribed 
performance-to-computational-budget tradeoff. With this purpose, we applied a standard BE for tracking the sequence of 
posterior probability measures of the system system (|TT]i-([T3]l generated by the three-dimensional Lorenz model described in 


Section V-A We generated a sequence of T = 2000 synthetic observations, {yt',t = 1,...,2000}, spread over an interval 
of 400 seconds (in continuous time), corresponding to 4 x 10® discrete time steps in the Euler-Maruyama scheme (hence, 
one observation every 200 steps). Since the time scale of the discrete time approximation of Eqs. (|TT]i-(|T3]l is n = 200f, a 
resampling step is taken every 200 steps of the underlying discrete-time system. 

We started running the PE with a sufficiently large number of particles, namely N = 5000, and then let the proposed 
algorithm decrease the number of particles to attain a prescribed point in the performance-to-computation-cost range. 
This point is controlled by the operation range of the p-value, which is in turn driven by the pair of significance 
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levels \p(, — ph\- We tested the algorithm for different ranges of p-values, namely, pi £ {0.5,0.4,0.3,0.2,0.1,0.05} and 
Ph £ {0.9, 0.8,0.7, 0.6,0.5, 0.4,0.3, 0.2,0.1}. When the p-value is below p^, the algorithm doubles the number of particles 
Mn+i = fup{Mn) = 2Mn, and when the p-value is over ph, the number of particles is halved, Mn+i = fiom{Mn) = M„/2. 
We used K = 7 fictitious observations and a window of size W = 20. 

In order to assess the approximation errors, we computed the empirical MSEs of the approximation of the posterior mean, 
E[Xt\Yi.t = yi:t], by averaging the MSEs for the whole sequences. Note that, since the actual expectation cannot be computed 
in closed form for this system, we used the true underlying sequence {X 2 oot}t=i, 2 ,... as the ground truth. 


C. Numerical results 


Table III shows results of the MSE of the approximation of the posterior mean, the average number of particles 

„ T 


^=T ^ 


(15) 


the p-values of the test, and the Hellinger distance ll30l between the empirical distribution of St and the uniform distribution. 
They were obtained by averaging over 100 runs and averaging over time steps for each run. The initial number of particles 
Mq = 2^®, and the minimum and maximum number of particles are = 2® and M,nax = 2^®, respectively. The first half 
of time steps were discarded for obtaining the displayed results in order to test the behavior of the algorithm for different sets 
of parameters (see Eq. (HD). Regarding the relation between the MSE and M and the p-values, it can be seen that selecting 
a high operation range yields good performance (low MSE) at the cost of using a large number of particles (high M). When 
we decrease the range of p-values, the algorithm decreases the number of particles, increasing also the approximation error. 
Table shows that this conclusion holds for any pair of [pi — ph] ■ 

Eigure shows the MSE, the number of particles M, and the execution time for the different operation ranges (solid blue 
line) compared to the particle filter with a fixed number of particles M = 2 ^® (dashed red line). It can be seen that with a 
moderate operation range {[p£ — Ph\ = [0.3 — 0.7]), the algorithm can perform (in terms of MSE) similarly to the case with 
fixed M, while reducing the execution time approximately by a factor of four. The execution time can be further reduced by 
decreasing the operation range, although this worsens the performance. 

Eigure 1^ displays the evolution of the number of particles over time (averaged over 100 runs) for [pe — ph] = [0.3 — 0.7] 
both when Mq = 5000 and Mq = 10. In this case, the minimum and maximum number of particles are M^in = 10 and 
Afmax = 5000, respectively. We see that, after some time, the number of particles adjusted by the algorithm does not depend 


on Mq. 

Eigure shows the same behavior for [pi — ph] = [0.2 — 0.6]. After some time, the filter uses less particles than the filter 
with results in Eig. because the selected range of thresholds employs smaller p-values. 

Eigure shows histograms of averaged MSE and M for simulations performed with two different sets of thresholds: 
[Pe — Ph] = [0.3 — 0.5] and [p£ — p/^] = [0.5 — 0.7]. In both cases, the initial number of particles is Mq = 5000. It can be 
seen that a more demanding pair of thresholds ([p^ — ph] = [0.5 — 0.7]) leads to better performance and a larger average 
number of particles. This behavior can also be seen in Eigure]^ where the MSE w.r.t. the number of particles is displayed for 
three different sets of thresholds. Note that a filter with a too relaxed set of thresholds ([p^ — ph] = [0.05 — 0.4]) uses very 
few particles but obtains a poor performance, while a filter with the most stringent set of thresholds {\pi — ph] = [0.5 — 0.9]) 
consistently yields a low MSE, at the expense of using a larger number of particles. 
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\PI - Ph\ 

Fixed M = 2^^ 

[0.4- 0.8] 

[0.35 - 0.7] 

[0.3 - 0.7] 

[0.25 - 0.65] 

[0.2 - 0.6] 

MSE 

1.5193 

1.5234 

1.5240 

1.5287 

3.7552 

4.6540 

M 

32768 

24951 

14840 

8729 

2197 

451 

p-val 

0.5108 

0.5089 

0.4902 

0.4815 

0.4872 

0.4785 

Hell, distance 

0.2312 

0.2355 

0.2493 

0.2462 

0.2476 

0.2521 

exec, time (s) 

6201 

5617 

3014 

1532 

131 

67 

time ratio 

1 

1.10 

2.1 

4.05 

47.43 

92.36 


TABLE III: Lorenz Model (Section 


V-Ai: A = 10-3, Tabs = 200A, 


0.5. Algorithm details: W = 20, K = 7, M„ax = 


^min = 2^. MSE in the approximation of the posterior mean, averaged number of particles M, averaged p-value, and averaged 


Hellinger distance. 



Eig. 1: Lorenz Model (Section V-Ai. MSE, number of particles M and execution time for different pairs of significance levels 
[Pi ~ Ph\ in solid blue line, and with a fixed number of particles M = 2^® in dashed red line. 


Pi = 0.3, = 0.7 



Eig. 2: Lorenz Model (Section V-Ai. Evolution of the number of particles adapted by the proposed algorithm when the initial 


number of particles Mq € {10, 5000}. The significance levels were set to pi = 0.3 and ph = 0.7. 
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P| = 0-2, p, = 0.6 



Fig. 3; Lorenz Model (Section V-Ai. Evolution of the number of particles adapted by the proposed algorithm when the initial 
number of particles Mq S {10, 5000}. The significance levels were set to pg = 0.2 and ph — 0.6. 



500 1000 1500 2000 2500 3000 3500 4000 4500 
M 



500 1000 1500 2000 2500 3000 3500 4000 4500 
M 



Fig. 4: Lorenz Model (Section 


V-Ai. Histograms of averaged MSE and M with [pi—ph] = [0.3 —0.5] and [pi—ph] = [0.5 —0.7]. 


In both cases, the initial number of particles Mq = 5000. 



Fig. 5: Lorenz Model (Section 


V-AI. MSE w.r.t. the averaged number of particles M for runs with different sets of thresholds. 
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VI. Conclusions 

In practice, the number of particles needed in a particle filter is usually determined in an ad hoc manner. Furthermore, 
this number is typically kept constant throughout tracking. In this paper, we have proposed a methodology for the online 
determination of the number of particles needed by the filter. The approach is based on assessing the convergence of the 
predictive distribution of the observations online. First we have proved, under standard assumptions, a novel convergence result 
on the approximation of this distribution. Then, we have proposed a method for adapting the number of particles based on 
the online assessment of the filter convergence. The proposed procedure is simple but not unique. One can develop a range of 
algorithms for adapting the number of particles using the proposed methodology. We have illustrated the performance of the 
suggested algorithm by computer simulations. 


Appendix A 
Proof of Theorem[T] 

Recall that the likelihood of Xt = Xt given the observation Yt = yt is denoted i.e., = piyMt)- For the 

sake of notational accuracy, we introduce the Markov transition kernel Tt{dxt\xt-i) that determines the dynamics of the state 
process. The correspondence with the notation in Section [n| is Tt{dxt\xt-i) = p{xt\xt-i)dxt, however all the results in this 
appendix (including Theorem are proved for the general case in which Tt does not necessarily have a density w.r.t. the 
Lebesgue measure. For notational coherence, we denote To(firo) = p{xo)dxo. 

The same as in Section [n| the integral of a function f : Z ^ R w.r.t. a measure a on the measurable space {B{Z),Z) is 
denoted (/, a) and the absolute supremum of / is written ||/|joo = supj,g 2 l/(2)l- The class of bounded functions is denoted 
B{Z) = {/ : Z —)■ K : ||/||oo < c»}. For p > 1, the Lp norm of a r.v. Z with associated probability measure ^{dz) is denoted 

iizii, ■.= E\\z\^]^ = l^j \zmdz)y, 

where E[-] denotes expectation. 

We start introducing some auxiliary results on the convergence of the approximate measure and integrals of the form 
This leads to the core result, which is the uniform convergence of p^iy^) —> Pt(yt) on n sequence of compact 
sets. The proof of Theorem [T] follows readily from the latter result. 

The analysis in this Appendix draws from methods developed in BTl for the estimation of the filter pdf p{xt\yi.t) using 
kernel functions, which herein are suitably adapted to the problem of approximating the predictive density ptiyt)- 

Lemma 1. Assume that the sequence yQ.^, for T < oo, is arbitrary but fixed, and, for each f = 1, 2,..., T, S B{X) and 
ff'f > 0. Then, there exist constants Ct < oo, t = 0, 1, ...,T, independent of M such that 

ll(/,^f)-(/,6)llp< t = 0 , 1 , 2 ,... 

for every f S B{X). 

Proof: This is a particular case of Lemma 1]. □ 


Lemma 2. Assume that the sequence yQ.^_i, for t < oo, is arbitrary but fixed. If assumptions (£) and (T>) hold, then for each 
p > 1 there exists a constant Ct < oo independent of M such that 


E 


Dlpf^iyt)- Dlpt(yt)\ 


< 


iL 

Mf 


( 16 ) 
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Proof: We first note that DyPtiyf) = {Dy(f^,^t) and DyP^^iy^) = {Dygl\^^), where function Dy(f^{xt) is bounded (for 
any Jt) because of assumption (J)). Since (£) also holds, Lemma [T] yields 


Dlpt(yt)\\p = 


< 


Ct 


Vm’ 


(17) 


where the constant ct = Ct\\Dygt\\oo is finite (see (D)) and independent of M. If we raise both sides of ( [T7] i to power p, then 
we obtain the desired result ( [Thl l. □ 

Lemma 3. Let {0^}m>i be a sequence of non-negative r.v.’s such that, for every p > 4, 


E 




M\P 


< 


Mi 


(18) 


where c < oo and Q < v < 1 are constants independent of M. Then, for every e G (0, 5 ) there exists an a.s. finite rv. 
independent of M such that 


qM ^ 






Proof: Let us choose an arbitrary constant G {v, 1) and define the r.v. = X]m=i ^ If ( [T^ holds, then 

the expectation Ellf^’P] is finite, as we prove in the sequel. Indeed, from Patou’s lemma. 




M=1 


< 


v—'il) — ! 


(19) 


( 20 ) 


M=1 


where ( | 20 | l follows from substituting ( [T 8 ] l into Since we have chosen tjj G {v, 1), then it follows that —l<v — y<{) 
and V — tp — 1 < —1, which ensures that X]m=i < 00 and, therefore, E < 00 . Since E < 00 , then 

< 00 a.s. 

For any given value of M, it is apparent from the dehnition of [7’^’P that 


and, as a consequence. 




qM < 




— . 1 1 +’/’ 

M2 V 


( 21 ) 


Ms-" 

where the equality in ( | 2 T] i follows from defining e := and C/" := Since ^ < 1 , it is sufficient to choose p > 4 

to ensure that e = < 2. Also, since p can actually be chosen as large as we wish, it follows that (| 2 T| holds for e > 0 as 

small as needed. □ 

Lemma 4. Assume that the sequence jQ-y, for T < 00 , is arbitrary but fixed, and, for t = 1,2,..., T, G B{X) and (ffi > 0. 
Then, for every 0 < e < ^ (arbitrarily small) there exist a.s. finite r.v.’s Up < 00 , t = 0,1,..., T, independent of M such that 

Up 

■ — - (22) 


ii(/,cr)-{/,6)iip< 


f = 0,l,2,... 

Ms-" 


for every f G B{X). 

Proof: From Lemma [T| for each f = 1, ...,T, there is a constant c* independent of M such that 

E[\{f,^n-if,^tW]<^^ 
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for any / G B{X). Therefore, we can apply Lemmawith c = cf ||/||^ and :/ = 0, to obtain the desired inequality ( |22| ). □ 

For the statement of the next result, we need to recall the definition of the sequence of hypercubes 

la IJ P P 

M‘^y’= , M'‘yy M'^yy, 

Cm := [-+^-] X • • • X I-+^“1 ^ ® ” 

in assumption (£), where p > 4 and 0 < /3 < 1 are constants w.r.t. M. 

Lemma 5. Let the sequence y^.rp, T < oo, be arbitrary but fixed and assume that (£) and (Ti) hold. Then, for any 0 < e < ^ 
and each t = 1, 2,T there exists an a.s. finite r.v. Vfi independent of M such that 

Vf 


sup \priy)-Ptiy)\< 


y^CM 


Mi 


(23) 


In particular, 


lim sup \pt (j)-pt(j)|=0 a.s. 
M^°OyeCM 


Proof: Let Bm = ^M'^yy, in such a way that the hypercube Cm can be written as Cm = [—C For any 
y = [j/i, 2 / 2 ; ■ • • I VdyY ^ Cm and any function / : M.'^v —r K continuous, bounded and differentiable, one can write 

/ VI fVdy fO M 

■ Dlf{z)dz- •••/ Dlf{z)dz. 

-bM ^—bM —bM —bM 

In particular, if y G [—(>m, and assumption (D) holds, then we can write 

/ bM pbM 

■ {Dlp^iy) - Dlptiy)) dy + (pf (0) - pt(0)) 

-bM ^—bM 

and, as a consequence, 

/ bM pbM 

■■■ \Dlpf^{y)-Dlptiy)\dy + \p^{0)-pt{0)\ 

-t>M V—Vm 

which, in turn, yields the inequality 


where 


An application of Jensen’s inequality yields, for p > 1, 


sup 

pf Cy) ■ 




pbM 


= / 


—bM 


\M , \M/ 


Dlp'iiy)- Dlpt(y)\dy. 


1 




-A 


m 


< 


1 


P—bM J — bM 


Olpf^(y) - Dlpt(y)\ dy, 


which leads to 


. ^ pOM pOM 

{A^Y < X / ■■■/ \DlpYiy)-Dlpt(yfdy. 

J — bM 'J — bn/T 


Since, from Lemma 


E 


\Dlpf(y)-Dlpt(y)\ 


< —— 
~ M2 ’ 


(24) 


(25) 


( 26 ) 


ndyPjfiyP^ 

E [[A^f] < 


MS 


MS-/3^ 


we can combine ([26ll and ([25|) to arrive at 













18 


where the equality follows from the relationship If we apply Lemma with 9^ = , p > A, v = (3 and 

c = cf, then we obtain a constant S ( ^) (see (| 2 T|) and a non-negative and a.s. finite random variable both of 


them independent of M, such that 


A^< 


yAsi 


(27) 


Moreover, Lemma yields 


E 


pr(y)-pt{y) 


= E 
< 




cfllg^llgo 

Mi '' 


where Cj < oo is a constant independent of M and |( 7 ^||oo < oo independently of j from assumption (£). Therefore, we can 

apply Lemma again, with 9^^ = |pj^(0) — P((0)|, p > 4, v = 0 and c = cf to obtain the inequality 

ypt(o).£2 


|pf(0 )-p*(0)| < 




(28) 


where 62 € is a constant and I/P*(o),e 2 ^ non-negative and a.s. finite r.v., both of them independent of M. 

If we choose e = Si = S 2 € define + ypt(o),e 2 ^ (jjgjj j-jjg combination of Eqs. ( |24| ), ( |Z7] i and 


yields 


sup \pf ij) -Pt{y)\ < 


VP 


ysCM 


Mi- 


where is a.s. finite. Note that V^^ and e are independent of M. Moreover, we can choose p as large as we wish and fi > Q 
as small as needed, hence we can effectively select e S ( 0 , 7). □ 

Before stating the next partial result, let us recall assumption (£) again, namely the inequality ptiCju) < bM~^, where 
b > 0 and p < 1 are constants w.r.t M and Cm is the complement of Cm- 

Lemma 6. Let the sequence Jq-^, T < 00 , be arbitrary but fixed and assume that (£,}, (T>) and (€) hold. Then, for any 
0 < e < ^ and each f = 1,2, ...,T there exists an a.s. finite r.v. Wf independent of M such that 

W! 


(29) 


Proof: We start with a trivial decomposition of the integrated absolute error, 

[ \pt^iy)-Pt(y)\dy = f \pf^{y)-pt(y)\dy+ [\pf^(y)-pt(y)\dy 

J J Cm Cm 

< [ \pf‘(y)-Pt(y)\dy + 2 [pt{y)dy [{pf^iy)-pt(y))dy, 

J Cm ^ Cm ^ Cm 

where the equality follows from Cm Pi Cm = and the inequality is obtained from the fact that pt and pf^ are non-negative, 
hence \pf^(y) — Pt(y)\ < Pt^iy) EPtiy)- Moreover, if we realize that 

/ (pf Cv) - Ptiy)) dy = 1- f pf^(y)dy - 1 + / Pt(y)dy 

Cm Cm ^ Cm 


' Cm 


Cm 

{ptiy) - pT iy)) dy 


then it is straightforward to see that 

({pT (y) - Ptiy)) dy < ( \p^{y)-Ptiy)\dy 
J Cm Cm 

and, as a consequence, substituting ( [30l l into 

( Wiy)-Ptiy)\dy <2/ \p^iy)-Ptiy)\dy+ 2 [pt{y)dy 

J J Cm 'J Cm 


(30) 


( 31 ) 
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The first term on the right-hand side of ( |3T| l can be bounded easily because Cm is compact, namely 

/ \pf‘iy)-Ptiy)\dx<C{CM)s\rp\pf‘{y)-ptiy)\, (32) 

JCm y^CM 

where C{Cm) = = Mp is the Lebesgue measure of Cm- From Lemmathe supremum in ( |3^ can be bounded as 

sup^gj;;;^ \Pt^iy) —pt{y)\ < where > 0 is an a.s. finite r.v. and < ei < ^ is a constant, both independent 

of M. Therefore, the inequality ( |3^ can be extended to yield 

f I M, ^ 

/ \Pt(y)-Pt(y)\dy< — 

■ICm 


MS- 


Mi- 


(33) 


where e = + p and = F/L If we choose ei < \ — p, then e € ^ . Note that, for /3 < 1 and choosing p > 6, 

^ — p — ^ i ~ I ^ hence both ei and e are well defined. Now, taking p large enough we can effectively select 

e S (0, ^). 

For the second integral in Eq. ( |3T| ), note that j-^Pt{y)dy = pt{CM) and, therefore, it can be bounded directly from the 
assumptions in the present Lemma, i.e.. 


' Cm 


Pt{y)dy < 2bM-^, 


(34) 


□ 


where b > 0 and p > 0 are constant w.r.t. M. Putting together Eqs. and ( [T4l i yields the desired result, with 

W! = 2(1// yh) <oo a.s. 

Einally, the proof of Theorem [T] is a straightforward application of Lemma 
Proof of Theorem 1. We first note that, for any bounded function h, 

{h,P^) - {h,pt) = J h{y)pf^{y)dy- J h{y)pt{y)dy 
= J Hy){pf^{y)-pt{y))dy, 

hence, trivially, 

|(^,Aif) - < Halloo J \pf^{y)-ptiy)\dy. 

If we apply Lemma on the right hand side of ( |T5] l then we readily obtain 

\{h,pf^) - {h,pt)\ < II 


(35) 






(36) 


where e S (0, is an arbitrarily small constant independent of M and W/ = 
of M. 


3IF/ is an a.s. finite r.v., also independent 

□ 


Appendix B 
Proof of Theorem|2] 


Let Yt denote the (random) observation at time t. Assume, without loss of generality, that F = M. The probability measure 
associated to yt|Fi.(_i = yi-,t-i is Pt{dy) and, therefore, we can write the cumulative distribution function ofYi\Yi.t_i = yi-,t-i 
as Ft{z) = (/(-oo,z],Mt). where 


is the indicator function. Obviously, || Fa I loo 


lA[y) 

1 < oo 


_ I 1, if 2/ G ^ 

I 0 , otherwise 

independently of the set A and, therefore. Theorem yields 


lim F^{z) = Ft{z) a.s. 

M—foo 
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for any 2; G M, where Fl^{z) = (/(_oo.z]> is the approximation of the cdf of Yt\Yi.t-i = yi-t-i provided by the BE 
Assume the actual observation is Yt = yt and we draw K i.i.d. fictitious observations yl^\ ... ,y[^'^ from the distribution 
with cdf . Given Yt = yt is fixed, the probability that exactly n out of K of these samples are lesser than yt coincides 
with the probability to have n successes out of K trials for a binomial r.v. with parameter (i.e., success probability) F^ {yt), 
which can be written as 

h^{yt) = {F^{yt)T (1 - F^{yt)f-^ . 

By integrating {yt) over the predictive distribution of Yt, we obtain the probability to have exactly n fictitious observations, 
out of K, which are less than the r.v. Yt, i.e., the probability that AK,M,t = n is 

qK,M,t{n) = {h^,yit). ( 37 ) 

However, Theorem yields limM->oo((i^, = {h^, nt) a.sj^and, in particular, there exists a sequence of non-negative 
r.v.’s {eM}M>i such that limM-yoo em = 0 a.s. and 


{h^ ,^,r)-eM<{h^,yt)<{h^,^^r)+ SM 

for each M. Moreover, it is apparent that {h^,y,f^) = (see Proposition which, together with ( [J7| ) and 
desired relationship 

~ < QK,M,t{n) < + Em 


(38) 
yields the 


for every n G {0, 


□ 
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